NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Stronger Models are Not Always Stronger Teachers for Instruction Tuning

Xu, Zhangchen; Jiang, Fengqing; Niu, Luyao; Lin, Bill Yuchen; Poovendran, Radha (April 2025, Proc. of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies)

Instruction tuning has been widely adopted to ensure large language models (LLMs) follow user instructions effectively. The resulting instruction-following capabilities of LLMs heavily rely on the instruction datasets used for tuning. Recently, synthetic instruction datasets have emerged as an economically viable solution to provide LLMs diverse and high-quality instructions. However, existing approaches typically assume that larger or stronger models are stronger teachers for instruction tuning, and hence simply adopt these models as response generators to the synthetic instructions. In this paper, we challenge this commonly-adopted assumption. Our extensive experiments across five base models and twenty response generators reveal that larger and stronger models are not necessarily stronger teachers of smaller models. We refer to this phenomenon as the Larger Models’ Paradox. We observe that existing metrics cannot precisely predict the effectiveness of response generators since they ignore the compatibility between teachers and base models being fine-tuned. We thus develop a novel metric, named as CompatibilityAdjusted Reward (CAR) to measure the effectiveness of response generators. Our experiments across five base models demonstrate that CAR outperforms almost all baselines.
more » « less
Full Text Available
ChatBug: A Common Vulnerability of Aligned LLMs Induced by Chat Templates

Jiang, Fengqing; Xu, Zhangchen; Niu, Luyao; Lin, Bill Yuchen; Poovendran, Radha (February 2025, The 39th Annual AAAI Conference on Artificial Intelligence)

Large language models (LLMs) are expected to follow in- structions from users and engage in conversations. Tech- niques to enhance LLMs’ instruction-following capabilities typically fine-tune them using data structured according to a predefined chat template. Although chat templates are shown to be effective in optimizing LLM performance, their impact on safety alignment of LLMs has been less understood, which is crucial for deploying LLMs safely at scale. In this paper, we investigate how chat templates affect safety alignment of LLMs. We identify a common vulnerability, named ChatBug, that is introduced by chat templates. Our key insight to identify ChatBug is that the chat templates provide a rigid format that need to be followed by LLMs, but not by users. Hence, a malicious user may not necessar- ily follow the chat template when prompting LLMs. Instead, malicious users could leverage their knowledge of the chat template and accordingly craft their prompts to bypass safety alignments of LLMs. We study two attacks to exploit the ChatBug vulnerability. Additionally, we demonstrate that the success of multiple existing attacks can be attributed to the ChatBug vulnerability. We show that a malicious user can exploit the ChatBug vulnerability of eight state-of-the- art (SOTA) LLMs and effectively elicit unintended responses from these models. Moreover, we show that ChatBug can be exploited by existing jailbreak attacks to enhance their at- tack success rates. We investigate potential countermeasures to ChatBug. Our results show that while adversarial train- ing effectively mitigates the ChatBug vulnerability, the vic- tim model incurs significant performance degradation. These results highlight the trade-off between safety alignment and helpfulness. Developing new methods for instruction tuning to balance this trade-off is an open and critical direction for future research.
more » « less
Full Text Available
TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition

https://doi.org/10.18653/v1/2020.acl-main.752

Lin, Bill Yuchen; Lee, Dong-Ho; Shen, Ming; Moreno, Ryan; Huang, Xiao; Shiralkar, Prashant; Ren, Xiang (July 2020, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics)

Full Text Available
AlpacaTag: An Active Learning-based Crowd Annotation Framework for Sequence Tagging

Lin, Bill Yuchen; Lee, Dong-Ho; Xu, Frank F; Lan, Ouyu; Ren, Xiang (January 2019, Proceedings of the 57th Conference of the Association for Computational Linguistics)

Full Text Available
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Srivastava, Aarohi; Rastogi, Abhinav; Rao, Abhishek; Shoeb, Abu Awal; Abid, Abubakar; Fisch, Adam; Brown, Adam R.; Santoro, Adam; Gupta, Aditya; Garriga-Alonso, Adri; et al (January 2023, Transactions on machine learning research)

Full Text Available

Search for: All records